111 research outputs found

    Listening to features

    This work explores nonparametric methods that aim at synthesizing audio from the low-dimensional acoustic features typically used in MIR frameworks. Several issues prevent this task from being achieved straightforwardly. Such features are designed for analysis and not for synthesis, thus favoring high-level description over easily inverted acoustic representation. Whereas some previous studies already considered the problem of synthesizing audio from features such as Mel-Frequency Cepstral Coefficients, they mainly relied on the explicit formula used to compute those features in order to invert them. Here, we instead adopt a simple blind approach, where arbitrary sets of features can be used during synthesis and where reconstruction is exemplar-based. After testing the approach on the problem of synthesizing speech from well-known features, we apply it to the more complex task of inverting songs from the Million Song Dataset. Two things make this task harder. First, the features are irregularly spaced in the temporal domain according to an onset-based segmentation. Second, the exact method used to compute these features is unknown, although the features for new audio can be computed using their API as a black box. In this paper, we detail these difficulties and present a framework that nonetheless attempts such synthesis by concatenating audio samples from a training dataset, whose features have been computed beforehand. Samples are selected at the segment level, in the feature space, with a simple nearest neighbor search. Additional constraints can then be defined to improve the relevance of the synthesis. Preliminary experiments are presented using the RWC and GTZAN audio datasets to synthesize tracks from the Million Song Dataset. Comment: Technical Report
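
    The segment-level selection described in this abstract can be illustrated with a minimal sketch. It assumes a plain Euclidean distance and a toy in-memory library; the names `nearest_segment`, `synthesize`, and `library`, as well as all numeric values, are hypothetical and do not come from the report, which may use different distances and additional constraints.

```python
import math

def nearest_segment(target, library):
    """Return the index of the library entry whose feature vector is closest to target."""
    return min(range(len(library)), key=lambda i: math.dist(target, library[i][0]))

def synthesize(target_features, library):
    """Concatenate the audio of the nearest library segment for each target vector."""
    output = []
    for feat in target_features:
        _, samples = library[nearest_segment(feat, library)]
        output.extend(samples)  # segments may have different lengths
    return output

# Toy training library of (feature vector, audio samples) pairs; values are made up.
library = [
    ((0.0, 0.0), [0.1, 0.2]),
    ((1.0, 1.0), [0.5, 0.6, 0.7]),
    ((5.0, 5.0), [-0.3]),
]
# Feature vectors of the track to invert, one per segment.
target_features = [(0.9, 1.2), (4.0, 6.0), (0.1, -0.1)]
audio = synthesize(target_features, library)
```

    Because each target segment is replaced by a whole training segment, the output length follows the selected exemplars rather than the original segmentation; handling that mismatch is left out of this sketch.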

    Principled methods for mixtures processing

    This document is my thesis for obtaining the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I did over the last 15 years and also presents the short-term research directions and applications I want to investigate. Regarding my past research, I first describe my work on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I mention my work on deep learning applied to audio, which rapidly turned into a large effort for community service. Finally, I present my contributions in machine learning, with some works on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and life sciences.

    Informed Source Separation from compressed mixtures using spatial Wiener filter and quantization noise estimation

    In a previous work, we proposed an Informed Source Separation system based on Wiener filtering for active listening of music from uncompressed (16-bit PCM) multichannel mix signals. In the present work, the system is improved to work with (MPEG-2 AAC) compressed mix signals: quantization noise is estimated from the AAC bitstream at the decoder and explicitly taken into account in the source separation process. Also, a direct MDCT-to-STFT transform is used to optimize the computational efficiency of the process in the STFT domain from AAC-decoded MDCT coefficients.

    Proof of Wiener-like linear regression of isotropic complex symmetric alpha-stable random variables

    This document features supplementary materials to the reference paper [1]. It provides the proof of equation (8) in [1]. This proof concerns a particular regression property of complex isotropic symmetric alpha-stable random variables (see [2]). In [1], this property is shown to be paramount in building efficient filters for separating symmetric alpha-stable processes. Such processes exhibit very large dynamic ranges while being locally stationary, and have been shown to be appropriate for audio modeling.

    Generalized Wiener filtering with fractional power spectrograms

    In recent years, many studies have focused on the single-sensor separation of independent waveforms using so-called soft-masking strategies, where the short-term Fourier transform of the mixture is multiplied element-wise by a ratio of spectrogram models. When the signals are wide-sense stationary, this strategy is theoretically justified as an optimal Wiener filtering: the power spectrograms of the sources are supposed to add up to yield the power spectrogram of the mixture. However, experience shows that using fractional spectrograms instead, such as the amplitude, yields good performance in practice, because they experimentally better fit the additivity assumption. To the best of our knowledge, no probabilistic interpretation of this filtering procedure was available to date. In this paper, we show that assuming the additivity of fractional spectrograms for the purpose of building soft-masks can be understood as separating locally stationary alpha-stable harmonizable processes, alpha-harmonizable in short, thus justifying the procedure theoretically.
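
    The soft-masking strategy described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: each source is represented by a nonnegative magnitude model per time-frequency bin, raised to an exponent `alpha` (2 for power spectrograms, 1 for amplitude). The function name `soft_mask_separate` and the toy values are hypothetical and not taken from the paper.

```python
def soft_mask_separate(mixture_stft, source_magnitudes, alpha=1.0):
    """Split each mixture bin among sources in proportion to magnitude ** alpha.

    mixture_stft: list of complex STFT coefficients (flattened bins).
    source_magnitudes: one list of nonnegative magnitudes per source.
    """
    estimates = [[] for _ in source_magnitudes]
    for k, x in enumerate(mixture_stft):
        fractional = [model[k] ** alpha for model in source_magnitudes]
        total = sum(fractional)
        for j, p in enumerate(fractional):
            gain = p / total if total > 0 else 0.0  # mask value in [0, 1]
            estimates[j].append(gain * x)           # element-wise masking
    return estimates

# Toy mixture with two bins and two sources; all values are made up.
mixture = [2 + 0j, 0 + 4j]
models = [[3.0, 1.0], [1.0, 1.0]]

amplitude_estimates = soft_mask_separate(mixture, models, alpha=1.0)
power_estimates = soft_mask_separate(mixture, models, alpha=2.0)
```

    Whatever the exponent, the masks sum to one in every bin where the models are not all zero, so the source estimates add back up to the mixture; only the way each bin is shared between sources changes with `alpha`.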

    An overview of informed audio source separation

    Audio source separation consists in recovering different unknown signals called sources by filtering their observed mixtures. In music processing, most mixtures are stereophonic songs and the sources are the individual signals played by the instruments, e.g. bass, vocals, guitar, etc. Source separation is often achieved through a classical generalized Wiener filtering, which is controlled by parameters such as the power spectrograms and the spatial locations of the sources. For an efficient filtering, those parameters need to be available and their estimation is the main challenge faced by separation algorithms. In the blind scenario, only the mixtures are available and performance strongly depends on the mixtures considered. In recent years, much research has focused on informed separation, which consists in using additional available information about the sources to improve the separation quality. In this paper, we review some recent trends in this direction.

    A diagonal plus low-rank covariance model for computationally efficient source separation

    This paper presents an accelerated version of positive semidefinite tensor factorization (PSDTF) for blind source separation. PSDTF works better than nonnegative matrix factorization (NMF) by dropping the arguable assumption that audio signals can be whitened in the frequency domain by using the short-term Fourier transform (STFT). Indeed, this assumption only holds true in an ideal situation where each frame is infinitely long and the target signal is completely stationary in each frame. PSDTF thus deals with full covariance matrices over frequency bins instead of forcing them to be diagonal as in NMF. Although PSDTF significantly outperforms NMF in terms of separation performance, it suffers from a heavy computational cost due to the repeated inversion of big covariance matrices. To solve this problem, we propose an intermediate model based on diagonal plus low-rank covariance matrices and derive the expectation-maximization (EM) algorithm for efficiently updating the parameters of PSDTF. Experimental results showed that our method can dramatically reduce the complexity of PSDTF by several orders of magnitude without a significant decrease in separation performance. Index Terms: Blind source separation, nonnegative matrix factorization, positive semidefinite tensor factorization, low-rank approximation.
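
    To see why a diagonal plus low-rank structure can make inversion cheap, one standard tool is the Woodbury matrix identity. The notation below is an assumption made for illustration (a positive diagonal matrix D of size F x F and a factor U of size F x R with R much smaller than F); the abstract does not state that the paper's derivation takes exactly this form.

```latex
\[
\boldsymbol{\Sigma} = \mathbf{D} + \mathbf{U}\mathbf{U}^{H},
\qquad
\boldsymbol{\Sigma}^{-1}
= \mathbf{D}^{-1}
- \mathbf{D}^{-1}\mathbf{U}
\left(\mathbf{I}_{R} + \mathbf{U}^{H}\mathbf{D}^{-1}\mathbf{U}\right)^{-1}
\mathbf{U}^{H}\mathbf{D}^{-1}.
\]
```

    Only an R x R matrix has to be inverted, and inverting D is element-wise, so the cost scales roughly as O(F R^2) instead of the O(F^3) of a dense F x F inversion.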

    OOPS: une approche orientée objet pour l'interrogation et l'analyse linguistique de l'interface prosodie/syntaxe/discours

    In this article, we address the study of multi-annotated spoken language. In such corpora, a single speech sample is associated with information belonging to different linguistic levels. This raises issues of organizing, storing, and accessing that information for the joint analysis of linguistic levels: intonosyntax, discourse-prosody, and syntax-pragmatics, for example. The main difficulty underlying the use of such a multi-annotated spoken-language corpus is relating units that belong to different linguistic levels. For every linguistic level represented, each annotation leads to its own hierarchy. Aggregating all these linguistic hierarchies, or trees, is the purpose of the proposed formalism. To study the interface between different linguistic levels, we propose an object-oriented approach, OOPS (Object-Oriented Processing of Speech), that can represent a wide variety of annotations within a single global architecture. Such a structure cannot be built entirely from the annotated transcription alone, which is at best usable by a human. It requires instead that the signal and the other annotation media be linked to this transcription for the joint study of linguistic units belonging to different levels. The distinctive feature of our approach is that it relies entirely on a modular, or object, formalism. A linguistic unit is viewed as an object (in the computing sense of the term) of the hierarchy corresponding to the linguistic level to which it belongs. These hierarchies are linked by the words of the transcription, which they share. It thus becomes possible to run queries involving several linguistic levels (syntax-prosody, syntax-pragmatics, or prosody-pragmatics) in order to extract any information deemed relevant. Our approach rests on the premise that the more modular the information, the simpler and more powerful its processing. This hypothesis led us to consider certain structures from a slightly different angle than the one proposed by the members of the Rhapsodie project, with the aim of making linguistic information ever more modular. From a much more practical standpoint, the system described in this article was developed as a Python module for analyzing and exploiting data annotated according to the scheme set up within the Rhapsodie project (Lacheret, Kahane & Pietrandrea (eds), forthcoming).

    Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques

    This paper investigates experimental means of measuring the transmission matrix (TM) of a highly scattering medium with the simplest possible optical setup. Spatial light modulation is performed by a digital micromirror device (DMD), which allows high rates and high pixel counts but only binary amplitude modulation. We use intensity measurements only, thus avoiding the need for a reference beam. The phase of the TM therefore has to be estimated through phase retrieval signal processing techniques. Here, we compare four different phase retrieval principles on noisy experimental data. We validate our estimations of the TM on three criteria: quality of prediction, distribution of singular values, and quality of focusing. Results indicate that Bayesian phase retrieval algorithms with variational approaches provide a good tradeoff between the computational complexity and the precision of the estimates.